FOREWORD


Density estimation plays a central role in probabilistic pattern recognition and signal processing. As data sets grow larger, the cost of assigning a definitive class to each observation can become prohibitive. It therefore becomes important to process the data in ways that make use of all available information.

This work was done under the joint support of the NSWCDD Independent Research program and the Office of Naval Research (R&T #44243 14).
ogy Group, and Dr. Glenn Moore, Head, Naval Surface Warfare Center Independent Research Program.


Approved  by: 


MARY  E.  LACEY,  Head 

Systems  Research  and  Technology  Department 




NSWCDD/TR-95/122


CONTENTS

INTRODUCTION
JOINT REPRESENTATION MIXTURE MODELS
    FINITE MIXTURE MODELS
    JOINT REPRESENTATION MIXTURE MODELS
    JOINT REPRESENTATION MIXTURE MODEL LIKELIHOOD FUNCTIONS
    MAXIMUM LIKELIHOOD E-M EQUATIONS
DERIVATION OF RECURSIVE UPDATE EQUATIONS FOR JOINT REPRESENTATION MODEL
RESULTS
    SUMMARY OF UNIVARIATE ITERATIVE E-M UPDATE EQUATIONS
    SUMMARY OF UNIVARIATE RECURSIVE UPDATE EQUATIONS
CONCLUSIONS
REFERENCES
DISTRIBUTION




INTRODUCTION 


Finite mixture models have proven to be quite flexible as parametric probability density function estimators. Recently an adaptive mixture model was presented whose complexity, or number of terms, is determined in a data-driven manner. This approach has made possible the use of mixture models within a semiparametric setting, giving them much more general applicability and utility than was possible under rigid parametric assumptions.

This semiparametric use of mixture models has resulted in efforts to develop alternative adaptive mixture model algorithms. Recent applications of semiparametric mixture model density estimation can be found in References 6 through 11. Thus, in addition to the traditional parametric uses of mixture models, the semiparametric application of mixture models is now well established.


One of the problems that arises in many applications of mixture models to density estimation for large-scale data sets is that, as the size of the data set increases, the class labeled data becomes a (small) subset of the total data set. That is, while many small data sets may have all the observations labeled as to class membership, large data sets often consist of labeled subsets plus a potentially large unlabeled subset.

The reason for this can be illustrated with an image processing example. Suppose that features are to be computed for each pixel for a number of images and that densities are to be computed for each class. Depending on the problem, the classes may correspond to vehicles, buildings, woods, and open terrain, or to tumorous and nontumorous tissue. If all the available data are to be used, the work of allocating each original pixel to one of the classes can easily become prohibitive. The more usual case is that only a representative subset of the training data is class labeled, with the balance either uncategorized or partially categorized. An example of the latter case is that it may be easy to say that there are no vehicles in this image, no buildings in that one, and so on, but very difficult or time consuming to identify each pixel corresponding to each class in each image. It is often the case in medical imagery that ground truth cannot be established definitively without a biopsy, again leading to less than full categorization of the observations.

Thus it is desirable to have a unified framework for handling this combined supervised (class labeled data)/unsupervised (unlabeled data) problem. This was the motivation behind the development of joint representation mixture models. When dealing with large data sets, by which we mean 100,000 observations or more, the iterative expectation-maximization (E-M) equations often become impractical. One method of dealing with this much data is to go to recursive formulations of the E-M equations. This approach also makes possible the implementation of the adaptive mixture model approach. The derivation of the recursive E-M equations for joint representation mixture models is the focus of this report.




JOINT  REPRESENTATION  MIXTURE  MODELS 

FINITE  MIXTURE  MODELS 

Given a probability density function that can be represented as a finite ($g$-term) mixture model

$$p(x \mid \psi) = \sum_{i=1}^{g} \pi_i f(x; \theta_i), \tag{1}$$

where $f(\cdot\,; \theta)$ denotes a generic member of the chosen parametric family, the likelihood function for $n$ observations is given by

$$L(\psi) = \prod_{j=1}^{n} \sum_{i=1}^{g} \pi_i f(x_j; \theta_i). \tag{2}$$

The vector $\theta_i$ represents the parameter set for the $i$th mixture component, while $\psi$ represents the combined total parameter set including the mixing coefficients $\pi_i$. The log-likelihood function is

$$\ln L(\psi) = \sum_{j=1}^{n} \ln \left[ \sum_{i=1}^{g} \pi_i f(x_j; \theta_i) \right]. \tag{3}$$

The maximum likelihood update equations can be obtained by taking derivatives of the log-likelihood function with respect to the mixture model parameters, setting the resulting expressions equal to zero, and solving for the parameters.
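As a minimal sketch of the log-likelihood in Equation (3), the following assumes univariate normal components; the function names `normal_pdf` and `mixture_log_likelihood` are illustrative and do not come from the report.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density, used here as the component f(x; theta)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_log_likelihood(xs, weights, means, variances):
    """Equation (3): sum over j of ln( sum over i of pi_i * f(x_j; theta_i) )."""
    total = 0.0
    for x in xs:
        density = sum(
            w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density)
    return total
```

For very large data sets or widely separated components, the inner sum can underflow; a log-sum-exp formulation would be a natural refinement of this sketch.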


JOINT  REPRESENTATION  MIXTURE  MODELS 

Consider the Joint Representation Mixture Model defined by

$$p(x \mid \psi) = \sum_{i=1}^{g} \sum_{m=1}^{M} p(\text{term } i)\, p(x \mid \text{term } i)\, p(\text{class } m \mid \text{term } i) = \sum_{i=1}^{g} \sum_{m=1}^{M} \pi_i f(x; \theta_i)\, C_{im}, \tag{4}$$

where $C_{im} = p(\text{class } m \mid \text{term } i)$ is an intra-term class mixing coefficient that gives the relative proportion of the $i$th term associated with the $m$th class, with the constraint

$$\sum_{m=1}^{M} C_{im} = \sum_{m=1}^{M} p(\text{class } m \mid \text{term } i) = 1. \tag{5}$$

This constraint merely says that, for each term independently, the class mixing coefficients must sum to one, or equivalently, that an observation from term $i$ must belong to one of the $M$ classes with probability one.
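The joint density of Equation (4) and the role of the constraint in Equation (5) can be illustrated with a short sketch. It assumes univariate normal components and stores the coefficients as a nested list `class_coeffs[i][m]`; these names and the data layout are hypothetical choices made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density, used here as the component f(x; theta)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def class_joint_density(x, m, weights, means, variances, class_coeffs):
    """Class-m part of Equation (4): sum over i of pi_i * f(x; theta_i) * C_im."""
    return sum(
        w * normal_pdf(x, mu, v) * c[m]
        for w, mu, v, c in zip(weights, means, variances, class_coeffs)
    )


def joint_density(x, weights, means, variances, class_coeffs):
    """Equation (4): the class-m parts summed over all M classes."""
    n_classes = len(class_coeffs[0])
    return sum(
        class_joint_density(x, m, weights, means, variances, class_coeffs)
        for m in range(n_classes)
    )
```

Because each row of `class_coeffs` sums to one, `joint_density` returns the same value as the ordinary finite mixture density built from the same terms.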

The mixture model defined in Equations (4) and (5) represents a significant departure from traditional mixture model usage. Historically, a single mixture model has been used either to perform unsupervised clustering or to generate a probability density function for observations from a single class. If observations from multiple classes are to be dealt with, then a separate mixture model is developed for data from each class. This latter approach leaves open the question of how to incorporate partially (class) categorized or uncategorized observations when there are separate mixture models for each class. As will be seen, the joint representation formulation leads to a unified treatment of these cases.


JOINT  REPRESENTATION  MIXTURE  MODEL  LIKELIHOOD  FUNCTIONS 
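The likelihood function for class categorized data is

$$L_c(\psi) = \prod_{j=1}^{n_c} \prod_{m=1}^{M} \left[ p(x_j, \text{Class } m \mid \psi) \right]^{z_{jm}} = \prod_{j=1}^{n_c} \prod_{m=1}^{M} \left[ \sum_{i=1}^{g} C_{im} \pi_i f(x_j; \theta_i) \right]^{z_{jm}}. \tag{6}$$

Here, $z_{jm}$ is a binary valued class indicator function. For observation $j$ from class $h$, $z_{jh} = 1$ and $z_{jm} = 0$ for $m \neq h$. It can thus be considered a picking function: it picks out the desired contribution to the likelihood function. For each term in the product where it has the value zero, the contribution to the product is one, so that the likelihood is unaffected. The log-likelihood function for class categorized data can be written

$$\begin{aligned}
\ln L_c(\psi) &= \sum_{j=1}^{n_c} \sum_{m=1}^{M} \ln \left[ \sum_{i=1}^{g} C_{im} \pi_i f(x_j; \theta_i) \right]^{z_{jm}} \\
&= \sum_{j=1}^{n_c} \sum_{m=1}^{M} z_{jm} \ln \left[ \sum_{i=1}^{g} C_{im} \pi_i f(x_j; \theta_i) \right] \\
&= \sum_{j=1}^{n_c} \ln \sum_{i=1}^{g} \sum_{m=1}^{M} z_{jm} C_{im} \pi_i f(x_j; \theta_i).
\end{aligned} \tag{7}$$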


To write down the likelihood for partially (class) categorized data, first consider the likelihood appropriate for the case where the data is both class and term categorized. In this case, the likelihood is

$$L(\psi) = \prod_{j=1}^{n_c} \prod_{m=1}^{M} \prod_{i=1}^{g} \left\{ \left[ C_{im} \pi_i f(x_j; \theta_i) \right]^{z_{jm} z_{ji}} \right\}, \tag{8}$$

where, as before, $z_{jm}$ ($z_{ji}$) is a binary valued class (term) indicator function. For observation $j$ from class $h$, $z_{jh} = 1$ and $z_{jm} = 0$ for $m \neq h$, while for observation $j$ from term $k$, $z_{jk} = 1$ and $z_{ji} = 0$ for $i \neq k$.


In the absence of complete knowledge of $z_{jm}$ and/or $z_{ji}$, the usual procedure is to use expected values for either or both, as

$$E[z_{jm}] = \zeta_{jm} \tag{9}$$

and

$$E[z_{ji}] = \tau^p_{ij}. \tag{10}$$

Then for partially (class) categorized data, the likelihood is

$$L_p(\psi) = \prod_{j=1}^{n_p} \prod_{i=1}^{g} \prod_{m=1}^{M} \left\{ \left[ C_{im} \pi_i f(x_j; \theta_i) \right]^{\zeta_{jm} \tau^p_{ij}} \right\}, \tag{11}$$

whence

$$\ln L_p(\psi) = \sum_{j=1}^{n_p} \sum_{i=1}^{g} \sum_{m=1}^{M} \ln \left\{ \left[ C_{im} \pi_i f(x_j; \theta_i) \right]^{\zeta_{jm} \tau^p_{ij}} \right\} = \sum_{j=1}^{n_p} \sum_{i=1}^{g} \sum_{m=1}^{M} \zeta_{jm} \tau^p_{ij} \ln \left[ C_{im} \pi_i f(x_j; \theta_i) \right], \tag{12}$$

where $\zeta_{jm}$ is a prior or expected probability of class membership with

$$\sum_{m=1}^{M} \zeta_{jm} = 1 \tag{13}$$

and

$$\tau^p_{ij} = \frac{\pi_i f(x_j; \theta_i)}{\sum_{i'=1}^{g} \pi_{i'} f(x_j; \theta_{i'})} \tag{14}$$

is the expectation (posterior probability) that the $j$th observation came from the $i$th mixture term. Since this is an expectation, it will be held fixed while taking derivatives. Two common methods for specifying partial categorization are (1) to give prior probabilities for each class for observation $j$, or (2) to specify that the priors are zero for some subset of the classes and to use posterior probabilities across the remaining possible classes for the unknown $\zeta_{jm}$.

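For uncategorized data,

$$L_u(\psi) = \prod_{j=1}^{n_u} \sum_{i=1}^{g} \sum_{m=1}^{M} C_{im} \pi_i f(x_j; \theta_i) = \prod_{j=1}^{n_u} \sum_{i=1}^{g} \pi_i f(x_j; \theta_i) \tag{15}$$

and

$$\ln L_u(\psi) = \sum_{j=1}^{n_u} \ln \sum_{i=1}^{g} \sum_{m=1}^{M} C_{im} \pi_i f(x_j; \theta_i) = \sum_{j=1}^{n_u} \ln \sum_{i=1}^{g} \pi_i f(x_j; \theta_i). \tag{16}$$

For combined categorized/partially categorized/uncategorized data,

$$\ln L(\psi) = \ln L_c(\psi) + \ln L_p(\psi) + \ln L_u(\psi) \tag{17}$$

or

$$\begin{aligned}
\ln L(\psi) = {} & \sum_{j=1}^{n_c} \ln \sum_{i=1}^{g} \sum_{m=1}^{M} z_{jm} C_{im} \pi_i f(x_j; \theta_i) \\
& + \sum_{j=1}^{n_p} \sum_{m=1}^{M} \sum_{i=1}^{g} \zeta_{jm} \tau^p_{ij} \ln \left[ C_{im} \pi_i f(x_j; \theta_i) \right] + \sum_{j=1}^{n_u} \ln \sum_{i=1}^{g} \pi_i f(x_j; \theta_i).
\end{aligned} \tag{18}$$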

Historically with mixture models, reference to categorization of data has been with respect to which term of the mixture model the observation is associated with. While this is logical when each term is ascribed a class status, as in clustering, in this work a completely different definition of categorized data is being used. In this case, the concern is that of categorizing data only with respect to class membership rather than with respect to individual mixture model terms.

To derive the maximum likelihood update equations, the parameter values that give a maximum of the log-likelihood function must be found. This can be accomplished by taking the derivative with respect to each parameter, setting it equal to zero, and solving the resultant system of equations for the parameters. This has been done previously, so those results will be taken as the starting point in developing the recursive versions here.
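

MAXIMUM LIKELIHOOD E-M EQUATIONS

The joint representation E-M equations are

$$C_{im} = \frac{\sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}, \tag{19}$$

$$\pi_i = \frac{1}{(n_c + n_p + n_u)} \left[ \sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij} \right], \tag{20}$$

$$\mu_i = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} x_j + \sum_{j=1}^{n_p} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}, \tag{21}$$

$$\Sigma_i = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} (x_j - \mu_i)^2 + \sum_{j=1}^{n_p} \tau^p_{ij} (x_j - \mu_i)^2 + \sum_{j=1}^{n_u} \tau^u_{ij} (x_j - \mu_i)^2}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}, \tag{22}$$

where

$$\tau^c_{ij} = \sum_{m=1}^{M} \tau^c_{ijm}, \qquad \tau^c_{ijm} = \frac{z_{jm} C_{im} \pi_i f(x_j; \theta_i)}{\sum_{i'=1}^{g} \sum_{m'=1}^{M} z_{jm'} C_{i'm'} \pi_{i'} f(x_j; \theta_{i'})} \tag{23}$$

and

$$\tau^p_{ij} = \sum_{m=1}^{M} \tau^p_{ijm}, \qquad \tau^p_{ijm} = \frac{\zeta_{jm} C_{im} \pi_i f(x_j; \theta_i)}{\sum_{i'=1}^{g} \sum_{m'=1}^{M} \zeta_{jm'} C_{i'm'} \pi_{i'} f(x_j; \theta_{i'})}. \tag{24}$$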
Similarly,

$$\tau^u_{ij} = \frac{\pi_i f(x_j; \theta_i)}{\sum_{i'=1}^{g} \pi_{i'} f(x_j; \theta_{i'})}, \tag{25}$$

where it is to be remembered that $\tau^p_{ij}$ is only computed when $x_j$ is a partially categorized observation, and similarly for $\tau^u_{ij}$.

The E-M algorithm then consists of iterating the expectation step, which consists of evaluating Equations (23), (24), and (25) for the appropriate observations, and the maximization step, which consists of evaluating new parameter values using Equations (19) through (22).
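The iteration just described can be sketched as a single E-M pass. This is a hedged illustration under stated assumptions, not the report's implementation: components are univariate normals, the partially categorized expectation step weights each $C_{im}$ by the class prior $\zeta_{jm}$ (one reading of Equation (24)), the variance update uses the freshly updated mean, and the name `em_step` and the data layout are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density, used here as the component f(x; theta)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(categorized, partial, uncategorized, pi, mu, var, C):
    """One E-M pass for a univariate normal joint representation mixture.

    categorized:   list of (x, h), h being the known class index
    partial:       list of (x, zeta), zeta being class priors that sum to one
    uncategorized: list of x
    """
    g, M = len(pi), len(C[0])
    c_num = [[0.0] * M for _ in range(g)]  # numerator of Equation (19)
    c_den = [0.0] * g                      # denominator of Equation (19)
    resp = []                              # (x, [tau_1j, ..., tau_gj]) per observation

    def normalize(w):
        total = sum(w)
        return [wi / total for wi in w]

    for x, h in categorized:               # Equation (23): z_jm selects class h
        tau = normalize([C[i][h] * pi[i] * normal_pdf(x, mu[i], var[i]) for i in range(g)])
        for i in range(g):
            c_num[i][h] += tau[i]
            c_den[i] += tau[i]
        resp.append((x, tau))

    for x, zeta in partial:                # Equation (24), as read in the lead-in
        tau = normalize([
            sum(zeta[m] * C[i][m] for m in range(M)) * pi[i] * normal_pdf(x, mu[i], var[i])
            for i in range(g)
        ])
        for i in range(g):
            for m in range(M):
                c_num[i][m] += zeta[m] * tau[i]
            c_den[i] += tau[i]
        resp.append((x, tau))

    for x in uncategorized:                # Equation (25)
        tau = normalize([pi[i] * normal_pdf(x, mu[i], var[i]) for i in range(g)])
        resp.append((x, tau))

    n = len(resp)
    mass = [sum(tau[i] for _, tau in resp) for i in range(g)]
    new_pi = [mass[i] / n for i in range(g)]                                   # Equation (20)
    new_mu = [sum(tau[i] * x for x, tau in resp) / mass[i] for i in range(g)]  # Equation (21)
    new_var = [sum(tau[i] * (x - new_mu[i]) ** 2 for x, tau in resp) / mass[i]
               for i in range(g)]                                              # Equation (22)
    new_C = [[c_num[i][m] / c_den[i] for m in range(M)] for i in range(g)]     # Equation (19)
    return new_pi, new_mu, new_var, new_C
```

The sketch does not guard against a term that receives no categorized or partially categorized weight, in which case the division in the last update would fail; a production version would need to handle that case.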

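The multivariate versions can be obtained by making $x_j$ and $\mu_i$ vector quantities and $\Sigma_i$ a matrix. The equations for $\mu_i$ and $\Sigma_i$ become

$$\mu_i^{k} = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} x_j^{k} + \sum_{j=1}^{n_p} \tau^p_{ij} x_j^{k} + \sum_{j=1}^{n_u} \tau^u_{ij} x_j^{k}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \tag{26}$$

and

$$\Sigma_i^{kl} = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} (x_j^{k} - \mu_i^{k})(x_j^{l} - \mu_i^{l}) + \sum_{j=1}^{n_p} \tau^p_{ij} (x_j^{k} - \mu_i^{k})(x_j^{l} - \mu_i^{l}) + \sum_{j=1}^{n_u} \tau^u_{ij} (x_j^{k} - \mu_i^{k})(x_j^{l} - \mu_i^{l})}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}, \tag{27}$$

where the component indices are denoted by superscripts.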

Finally, once the joint representation mixture model has been obtained based on any combination of class categorized, partially class categorized, and uncategorized data, the probability density function for an individual class can, if desired, be obtained through

$$p(x \mid \text{Class } m, \psi) = \frac{\sum_{i=1}^{g} C_{im} \pi_i f(x; \theta_i)}{\sum_{i=1}^{g} C_{im} \pi_i}. \tag{28}$$

This gives a properly normalized mixture model density estimate for an individual class.

These results serve as the starting point for the derivation of the recursive update equations.
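A short sketch of the class density recovery in Equation (28) follows, again assuming univariate normal components; `class_density` and its argument layout are illustrative names, not taken from the report.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density, used here as the component f(x; theta)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def class_density(x, m, pi, mu, var, C):
    """Equation (28): density of class m recovered from the joint model."""
    terms = range(len(pi))
    numerator = sum(C[i][m] * pi[i] * normal_pdf(x, mu[i], var[i]) for i in terms)
    denominator = sum(C[i][m] * pi[i] for i in terms)
    return numerator / denominator
```

The denominator is the total weight the joint model assigns to class $m$, which is why the recovered density integrates to one.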


DERIVATION  OF  RECURSIVE  UPDATE  EQUATIONS  FOR  JOINT  REPRESENTATION 
MODEL 


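Recursive Update Equation for C

Consider first $C_{im}^{(n_c+1,\,n_p)}$, which implies that the latest observation is class categorized. In terms of the E-M expression,

$$C_{im}^{(n_c+1,\,n_p)} = \frac{\sum_{j=1}^{n_c+1} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \tag{29}$$

or, equivalently,

$$C_{im}^{(n_c+1,\,n_p)} = \frac{\tau^c_{i(n_c+1)m} + \sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}. \tag{30}$$

The right-hand side of this equation can be broken into two terms, corresponding to the last observation and all previous observations, respectively:

$$C_{im}^{(n_c+1,\,n_p)} = \frac{\tau^c_{i(n_c+1)m}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} + T_2, \tag{31}$$

where the second term is given by

$$T_2 = \frac{\sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}. \tag{32}$$

This can be rewritten as

$$T_2 = \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\tau^c_{i(n_c+1)} + \sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \right] \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \right], \tag{33}$$

which, with a minor rearrangement of terms, becomes

$$T_2 = \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}{\tau^c_{i(n_c+1)} + \sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \right] \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \right]. \tag{34}$$

The last term in brackets on the right-hand side is just $C_{im}^{(n_c,\,n_p)}$, so that

$$T_2 = \left[ 1 - \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \right] C_{im}^{(n_c,\,n_p)}. \tag{35}$$

Then

$$C_{im}^{(n_c+1,\,n_p)} = C_{im}^{(n_c,\,n_p)} + \frac{\tau^c_{i(n_c+1)m} - \tau^c_{i(n_c+1)}\, C_{im}^{(n_c,\,n_p)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}. \tag{36}$$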


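Using the identity

$$\tau^c_{i(n_c+1)m} = z_{(n_c+1)m} \sum_{m'=1}^{M} \tau^c_{i(n_c+1)m'} = z_{(n_c+1)m}\, \tau^c_{i(n_c+1)},$$

the final expression can be written as

$$C_{im}^{(n_c+1,\,n_p)} = C_{im}^{(n_c,\,n_p)} + \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \left[ z_{(n_c+1)m} - C_{im}^{(n_c,\,n_p)} \right]. \tag{37}$$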


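Similarly, for a partially categorized observation,

$$C_{im}^{(n_c,\,n_p+1)} = \frac{\sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p+1} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij}} = \frac{\zeta_{(n_p+1)m}\, \tau^p_{i(n_p+1)} + \sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij}} \tag{38}$$

or, as before,

$$C_{im}^{(n_c,\,n_p+1)} = \frac{\zeta_{(n_p+1)m}\, \tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij}} + T_2, \tag{39}$$

with

$$T_2 = \frac{\sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij}}. \tag{40}$$

Proceeding as before in the completely categorized case,

$$T_2 = \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \right] \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij}} \right] \tag{41}$$

or, equivalently,

$$T_2 = C_{im}^{(n_c,\,n_p)} \left[ \frac{\tau^p_{i(n_p+1)} - \tau^p_{i(n_p+1)} + \sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}{\tau^p_{i(n_p+1)} + \sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \right]. \tag{42}$$

This is readily seen to be

$$T_2 = C_{im}^{(n_c,\,n_p)} \left[ 1 - \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij}} \right], \tag{43}$$

which leads to the final result for a partially categorized observation

$$C_{im}^{(n_c,\,n_p+1)} = C_{im}^{(n_c,\,n_p)} + \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij}} \left[ \zeta_{(n_p+1)m} - C_{im}^{(n_c,\,n_p)} \right]. \tag{44}$$

Recall that no update takes place for an uncategorized observation.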
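The two recursive updates for $C_{im}$ share one form: the old coefficient moves toward a target by a gain equal to the new responsibility divided by the updated running sum. The sketch below assumes that, for a categorized observation, the target is the class indicator (since $\tau^c_{ijm}$ vanishes for every class except the labeled one), and that for a partially categorized observation it is the class prior vector. The function name and the bookkeeping variable `resp_sum` are hypothetical.

```python
def update_class_coeffs(C_i, tau, target, resp_sum):
    """Recursive update of the class coefficients C_im of a single term i.

    C_i:      current [C_i1, ..., C_iM]
    tau:      responsibility of term i for the new observation
    target:   class indicator z (categorized) or class priors zeta (partial)
    resp_sum: running sum of tau^c_ij and tau^p_ij over earlier observations
    """
    new_sum = resp_sum + tau
    gain = tau / new_sum
    new_C_i = [c + gain * (t - c) for c, t in zip(C_i, target)]
    return new_C_i, new_sum
```

Each call reproduces the batch ratio of Equation (19) for the enlarged data set, provided the earlier responsibilities are treated as fixed, which is the approximation inherent in the recursive formulation.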


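Recursive Update Equation for π

Consider first $\pi_i^{(n_c+1,\,n_p,\,n_u)}$, which implies that the latest observation is class categorized. In terms of the E-M expression,

$$\pi_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}{(n_c+1) + n_p + n_u} \tag{45}$$

or, equivalently,

$$\pi_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\tau^c_{i(n_c+1)} + \sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}{(n_c+1) + n_p + n_u}. \tag{46}$$

The right-hand side of this equation can be broken into two terms, corresponding to the last observation and all previous observations, respectively:

$$\pi_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\tau^c_{i(n_c+1)}}{(n_c+1) + n_p + n_u} + \frac{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}{(n_c+1) + n_p + n_u}. \tag{47}$$

This can be rewritten as

$$\pi_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\tau^c_{i(n_c+1)}}{(n_c+1) + n_p + n_u} + \left[ \frac{n_c + n_p + n_u}{(n_c+1) + n_p + n_u} \right] \pi_i^{(n_c,\,n_p,\,n_u)}, \tag{48}$$

which becomes

$$\pi_i^{(n_c+1,\,n_p,\,n_u)} = \pi_i^{(n_c,\,n_p,\,n_u)} + \frac{1}{(n_c + n_p + n_u) + 1} \left[ \tau^c_{i(n_c+1)} - \pi_i^{(n_c,\,n_p,\,n_u)} \right]. \tag{49}$$

Consider next $\pi_i^{(n_c,\,n_p+1,\,n_u)}$, which implies that the latest observation is partially class categorized. In terms of the E-M expression,

$$\pi_i^{(n_c,\,n_p+1,\,n_u)} = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}{n_c + (n_p+1) + n_u} \tag{50}$$

or, equivalently,

$$\pi_i^{(n_c,\,n_p+1,\,n_u)} = \frac{\tau^p_{i(n_p+1)} + \sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}{n_c + (n_p+1) + n_u}. \tag{51}$$

This readily becomes

$$\pi_i^{(n_c,\,n_p+1,\,n_u)} = \frac{\tau^p_{i(n_p+1)}}{n_c + (n_p+1) + n_u} + \left[ \frac{n_c + n_p + n_u}{n_c + (n_p+1) + n_u} \right] \pi_i^{(n_c,\,n_p,\,n_u)}, \tag{52}$$

which can be rewritten as

$$\pi_i^{(n_c,\,n_p+1,\,n_u)} = \pi_i^{(n_c,\,n_p,\,n_u)} + \frac{1}{(n_c + n_p + n_u) + 1} \left[ \tau^p_{i(n_p+1)} - \pi_i^{(n_c,\,n_p,\,n_u)} \right]. \tag{53}$$

For uncategorized observations, it is easy to show that the correct expression is

$$\pi_i^{(n_c,\,n_p,\,n_u+1)} = \pi_i^{(n_c,\,n_p,\,n_u)} + \frac{1}{(n_c + n_p + n_u) + 1} \left[ \tau^u_{i(n_u+1)} - \pi_i^{(n_c,\,n_p,\,n_u)} \right]. \tag{54}$$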
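All three recursive updates for the mixing coefficients have the same form, differing only in which responsibility is supplied. A minimal sketch, with the hypothetical name `update_mixing_coeffs` and `n_seen` standing for $n_c + n_p + n_u$ before the new observation arrives:

```python
def update_mixing_coeffs(pi, tau, n_seen):
    """pi_i <- pi_i + (tau_i - pi_i) / (n_seen + 1) for every term i.

    pi:     current mixing coefficients
    tau:    responsibilities of each term for the new observation
    n_seen: number of observations processed so far (n_c + n_p + n_u)
    """
    return [p + (t - p) / (n_seen + 1) for p, t in zip(pi, tau)]
```

If both `pi` and `tau` sum to one, the updated coefficients also sum to one, so no renormalization step is needed.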


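Recursive Update Equation for μ

Consider first $\mu_i^{(n_c+1,\,n_p,\,n_u)}$, which implies that the latest observation is class categorized. In terms of the E-M expression,

$$\mu_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\sum_{j=1}^{n_c+1} \tau^c_{ij} x_j + \sum_{j=1}^{n_p} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \tag{55}$$

or, equivalently,

$$\mu_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\tau^c_{i(n_c+1)} x_{n_c+1} + \sum_{j=1}^{n_c} \tau^c_{ij} x_j + \sum_{j=1}^{n_p} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}. \tag{56}$$

The right-hand side of this equation can be broken into two terms, corresponding to the last observation and all previous observations, respectively:

$$\mu_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\tau^c_{i(n_c+1)} x_{n_c+1}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} + \frac{\sum_{j=1}^{n_c} \tau^c_{ij} x_j + \sum_{j=1}^{n_p} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}. \tag{57}$$

This can be rewritten as

$$\mu_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\tau^c_{i(n_c+1)} x_{n_c+1}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} + \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \right] \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ij} x_j + \sum_{j=1}^{n_p} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \right], \tag{58}$$

which becomes

$$\mu_i^{(n_c+1,\,n_p,\,n_u)} = \mu_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ x_{n_c+1} - \mu_i^{(n_c,\,n_p,\,n_u)} \right]. \tag{59}$$

Consider next $\mu_i^{(n_c,\,n_p+1,\,n_u)}$, which implies that the latest observation is partially class categorized. In terms of the E-M expression,

$$\mu_i^{(n_c,\,n_p+1,\,n_u)} = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} x_j + \sum_{j=1}^{n_p+1} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \tag{60}$$

or, equivalently,

$$\mu_i^{(n_c,\,n_p+1,\,n_u)} = \frac{\tau^p_{i(n_p+1)} x_{n_p+1} + \sum_{j=1}^{n_c} \tau^c_{ij} x_j + \sum_{j=1}^{n_p} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}. \tag{61}$$

This readily becomes

$$\mu_i^{(n_c,\,n_p+1,\,n_u)} = \frac{\tau^p_{i(n_p+1)} x_{n_p+1}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} + \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \right] \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ij} x_j + \sum_{j=1}^{n_p} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \right], \tag{62}$$

which becomes

$$\mu_i^{(n_c,\,n_p+1,\,n_u)} = \mu_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ x_{n_p+1} - \mu_i^{(n_c,\,n_p,\,n_u)} \right]. \tag{63}$$

For uncategorized observations, it is easy to show that the correct expression is

$$\mu_i^{(n_c,\,n_p,\,n_u+1)} = \mu_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^u_{i(n_u+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u+1} \tau^u_{ij}} \left[ x_{n_u+1} - \mu_i^{(n_c,\,n_p,\,n_u)} \right]. \tag{64}$$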


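Recursive Update Equation for Σ

Consider first $\Sigma_i^{(n_c+1,\,n_p,\,n_u)}$, which implies that the latest observation is class categorized. The E-M expression is

$$\Sigma_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\sum_{j=1}^{n_c+1} \tau^c_{ij} (x_j - \mu_i)^2 + \sum_{j=1}^{n_p} \tau^p_{ij} (x_j - \mu_i)^2 + \sum_{j=1}^{n_u} \tau^u_{ij} (x_j - \mu_i)^2}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \tag{65}$$

or, equivalently,

$$\Sigma_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\tau^c_{i(n_c+1)} (x_{n_c+1} - \mu_i)^2 + \sum_{j=1}^{n_c} \tau^c_{ij} (x_j - \mu_i)^2 + \sum_{j=1}^{n_p} \tau^p_{ij} (x_j - \mu_i)^2 + \sum_{j=1}^{n_u} \tau^u_{ij} (x_j - \mu_i)^2}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}. \tag{66}$$

This can be rewritten as

$$\Sigma_i^{(n_c+1,\,n_p,\,n_u)} = \frac{\tau^c_{i(n_c+1)} (x_{n_c+1} - \mu_i)^2}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} + \left[ \frac{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \right] \Sigma_i^{(n_c,\,n_p,\,n_u)}, \tag{67}$$

which becomes

$$\Sigma_i^{(n_c+1,\,n_p,\,n_u)} = \Sigma_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ (x_{n_c+1} - \mu_i)^2 - \Sigma_i^{(n_c,\,n_p,\,n_u)} \right]. \tag{68}$$

Similarly, for partially categorized and uncategorized data, respectively,

$$\Sigma_i^{(n_c,\,n_p+1,\,n_u)} = \Sigma_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ (x_{n_p+1} - \mu_i)^2 - \Sigma_i^{(n_c,\,n_p,\,n_u)} \right] \tag{69}$$

and

$$\Sigma_i^{(n_c,\,n_p,\,n_u+1)} = \Sigma_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^u_{i(n_u+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u+1} \tau^u_{ij}} \left[ (x_{n_u+1} - \mu_i)^2 - \Sigma_i^{(n_c,\,n_p,\,n_u)} \right]. \tag{70}$$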
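The recursive mean and variance updates share a single gain, the new responsibility divided by the updated running sum of responsibilities. The sketch below handles one term and one new observation of any categorization type. It assumes the variance update uses the mean held before the update, which is a choice made for this example; the function and variable names are hypothetical.

```python
def update_mean_variance(mean, var, x, tau, resp_sum):
    """Recursive update of one term's univariate mean and variance.

    mean, var: current estimates for term i
    x:         the new observation
    tau:       responsibility of term i for x (tau^c, tau^p, or tau^u)
    resp_sum:  running sum of all responsibilities of term i so far
    """
    new_sum = resp_sum + tau
    gain = tau / new_sum
    new_mean = mean + gain * (x - mean)
    # Assumption: the squared deviation is taken about the previous mean.
    new_var = var + gain * ((x - mean) ** 2 - var)
    return new_mean, new_var, new_sum
```

The mean recursion reproduces the responsibility-weighted batch mean exactly when earlier responsibilities are held fixed; the variance recursion is an approximation because the mean it deviates from changes as data arrive.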


RESULTS 


SUMMARY  OF  UNIVARIATE  ITERATIVE  E-M  UPDATE  EQUATIONS 
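For completeness, the iterative E-M equations are reproduced first.

$$C_{im} = \frac{\sum_{j=1}^{n_c} \tau^c_{ijm} + \sum_{j=1}^{n_p} \zeta_{jm} \tau^p_{ij}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}}, \tag{71}$$

$$\pi_i = \frac{1}{(n_c + n_p + n_u)} \left[ \sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij} \right], \tag{72}$$

$$\mu_i = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} x_j + \sum_{j=1}^{n_p} \tau^p_{ij} x_j + \sum_{j=1}^{n_u} \tau^u_{ij} x_j}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}, \tag{73}$$

$$\Sigma_i = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} (x_j - \mu_i)^2 + \sum_{j=1}^{n_p} \tau^p_{ij} (x_j - \mu_i)^2 + \sum_{j=1}^{n_u} \tau^u_{ij} (x_j - \mu_i)^2}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}, \tag{74}$$

where

$$\tau^c_{ij} = \sum_{m=1}^{M} \tau^c_{ijm}, \qquad \tau^c_{ijm} = \frac{z_{jm} C_{im} \pi_i f(x_j; \theta_i)}{\sum_{i'=1}^{g} \sum_{m'=1}^{M} z_{jm'} C_{i'm'} \pi_{i'} f(x_j; \theta_{i'})} \tag{75}$$

and

$$\tau^p_{ij} = \sum_{m=1}^{M} \tau^p_{ijm}, \qquad \tau^p_{ijm} = \frac{\zeta_{jm} C_{im} \pi_i f(x_j; \theta_i)}{\sum_{i'=1}^{g} \sum_{m'=1}^{M} \zeta_{jm'} C_{i'm'} \pi_{i'} f(x_j; \theta_{i'})}. \tag{76}$$

Similarly,

$$\tau^u_{ij} = \frac{\pi_i f(x_j; \theta_i)}{\sum_{i'=1}^{g} \pi_{i'} f(x_j; \theta_{i'})}, \tag{77}$$

where it is to be remembered that $\tau^p_{ij}$ is only computed when $x_j$ is a partially categorized observation, and similarly for $\tau^u_{ij}$.

The E-M algorithm then consists of iterating the expectation step, which consists of evaluating Equations (75), (76), and (77) for the appropriate observations, and the maximization step, which consists of evaluating new parameter values using Equations (71) through (74).

The multivariate versions can be obtained by making $x_j$ and $\mu_i$ vector quantities and $\Sigma_i$ a matrix. The equations for $\mu_i$ and $\Sigma_i$ become

$$\mu_i^{k} = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} x_j^{k} + \sum_{j=1}^{n_p} \tau^p_{ij} x_j^{k} + \sum_{j=1}^{n_u} \tau^u_{ij} x_j^{k}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \tag{78}$$

and

$$\Sigma_i^{kl} = \frac{\sum_{j=1}^{n_c} \tau^c_{ij} (x_j^{k} - \mu_i^{k})(x_j^{l} - \mu_i^{l}) + \sum_{j=1}^{n_p} \tau^p_{ij} (x_j^{k} - \mu_i^{k})(x_j^{l} - \mu_i^{l}) + \sum_{j=1}^{n_u} \tau^u_{ij} (x_j^{k} - \mu_i^{k})(x_j^{l} - \mu_i^{l})}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}}, \tag{79}$$

where the component indices are denoted by superscripts.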


SUMMARY  OF  UNIVARIATE  RECURSIVE  UPDATE  EQUATIONS 


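Categorized Observation

$$C_{im}^{(n_c+1,\,n_p)} = C_{im}^{(n_c,\,n_p)} + \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij}} \left[ z_{(n_c+1)m} - C_{im}^{(n_c,\,n_p)} \right] \tag{80}$$

$$\pi_i^{(n_c+1,\,n_p,\,n_u)} = \pi_i^{(n_c,\,n_p,\,n_u)} + \frac{1}{(n_c + n_p + n_u) + 1} \left[ \tau^c_{i(n_c+1)} - \pi_i^{(n_c,\,n_p,\,n_u)} \right] \tag{81}$$

$$\mu_i^{(n_c+1,\,n_p,\,n_u)} = \mu_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ x_{n_c+1} - \mu_i^{(n_c,\,n_p,\,n_u)} \right] \tag{82}$$

$$\Sigma_i^{(n_c+1,\,n_p,\,n_u)} = \Sigma_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ (x_{n_c+1} - \mu_i)^2 - \Sigma_i^{(n_c,\,n_p,\,n_u)} \right] \tag{83}$$

For the multivariate case, Equations (82) and (83) become (with vector indices $k$ and $l$)

$$\mu_i^{k\,(n_c+1,\,n_p,\,n_u)} = \mu_i^{k\,(n_c,\,n_p,\,n_u)} + \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ x_{n_c+1}^{k} - \mu_i^{k\,(n_c,\,n_p,\,n_u)} \right] \tag{84}$$

$$\Sigma_i^{kl\,(n_c+1,\,n_p,\,n_u)} = \Sigma_i^{kl\,(n_c,\,n_p,\,n_u)} + \frac{\tau^c_{i(n_c+1)}}{\sum_{j=1}^{n_c+1} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ (x_{n_c+1}^{k} - \mu_i^{k})(x_{n_c+1}^{l} - \mu_i^{l}) - \Sigma_i^{kl\,(n_c,\,n_p,\,n_u)} \right] \tag{85}$$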


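Partially Categorized Observation

$$C_{im}^{(n_c,\,n_p+1)} = C_{im}^{(n_c,\,n_p)} + \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij}} \left[ \zeta_{(n_p+1)m} - C_{im}^{(n_c,\,n_p)} \right] \tag{86}$$

$$\pi_i^{(n_c,\,n_p+1,\,n_u)} = \pi_i^{(n_c,\,n_p,\,n_u)} + \frac{1}{(n_c + n_p + n_u) + 1} \left[ \tau^p_{i(n_p+1)} - \pi_i^{(n_c,\,n_p,\,n_u)} \right] \tag{87}$$

$$\mu_i^{(n_c,\,n_p+1,\,n_u)} = \mu_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ x_{n_p+1} - \mu_i^{(n_c,\,n_p,\,n_u)} \right] \tag{88}$$

$$\Sigma_i^{(n_c,\,n_p+1,\,n_u)} = \Sigma_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ (x_{n_p+1} - \mu_i)^2 - \Sigma_i^{(n_c,\,n_p,\,n_u)} \right] \tag{89}$$

For the multivariate case, Equations (88) and (89) become (with vector indices $k$ and $l$)

$$\mu_i^{k\,(n_c,\,n_p+1,\,n_u)} = \mu_i^{k\,(n_c,\,n_p,\,n_u)} + \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ x_{n_p+1}^{k} - \mu_i^{k\,(n_c,\,n_p,\,n_u)} \right] \tag{90}$$

$$\Sigma_i^{kl\,(n_c,\,n_p+1,\,n_u)} = \Sigma_i^{kl\,(n_c,\,n_p,\,n_u)} + \frac{\tau^p_{i(n_p+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p+1} \tau^p_{ij} + \sum_{j=1}^{n_u} \tau^u_{ij}} \left[ (x_{n_p+1}^{k} - \mu_i^{k})(x_{n_p+1}^{l} - \mu_i^{l}) - \Sigma_i^{kl\,(n_c,\,n_p,\,n_u)} \right] \tag{91}$$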


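Uncategorized Observation

$$\pi_i^{(n_c,\,n_p,\,n_u+1)} = \pi_i^{(n_c,\,n_p,\,n_u)} + \frac{1}{(n_c + n_p + n_u) + 1} \left[ \tau^u_{i(n_u+1)} - \pi_i^{(n_c,\,n_p,\,n_u)} \right] \tag{92}$$

$$\mu_i^{(n_c,\,n_p,\,n_u+1)} = \mu_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^u_{i(n_u+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u+1} \tau^u_{ij}} \left[ x_{n_u+1} - \mu_i^{(n_c,\,n_p,\,n_u)} \right] \tag{93}$$

$$\Sigma_i^{(n_c,\,n_p,\,n_u+1)} = \Sigma_i^{(n_c,\,n_p,\,n_u)} + \frac{\tau^u_{i(n_u+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u+1} \tau^u_{ij}} \left[ (x_{n_u+1} - \mu_i)^2 - \Sigma_i^{(n_c,\,n_p,\,n_u)} \right] \tag{94}$$

For the multivariate case, Equations (93) and (94) become (with vector indices $k$ and $l$)

$$\mu_i^{k\,(n_c,\,n_p,\,n_u+1)} = \mu_i^{k\,(n_c,\,n_p,\,n_u)} + \frac{\tau^u_{i(n_u+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u+1} \tau^u_{ij}} \left[ x_{n_u+1}^{k} - \mu_i^{k\,(n_c,\,n_p,\,n_u)} \right] \tag{95}$$

$$\Sigma_i^{kl\,(n_c,\,n_p,\,n_u+1)} = \Sigma_i^{kl\,(n_c,\,n_p,\,n_u)} + \frac{\tau^u_{i(n_u+1)}}{\sum_{j=1}^{n_c} \tau^c_{ij} + \sum_{j=1}^{n_p} \tau^p_{ij} + \sum_{j=1}^{n_u+1} \tau^u_{ij}} \left[ (x_{n_u+1}^{k} - \mu_i^{k})(x_{n_u+1}^{l} - \mu_i^{l}) - \Sigma_i^{kl\,(n_c,\,n_p,\,n_u)} \right] \tag{96}$$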


CONCLUSIONS 


The derivation of the recursive E-M equations for joint representation mixture models with normal components has been presented. While the detailed derivation was for the univariate case, a slightly more complicated derivation results in the full multivariate equations. The results for this case have been presented without derivation.

The joint representation approach represents a significant philosophical departure from current mixture model usage. The standard mixture model usage is either to build a separate mixture model for each class when the observations are class labeled, or to assume that each class is normally distributed so that a mixture model for all the data can be interpreted as a mixture of normal classes. The joint representation approach, in effect, totally relaxes the requirement for each class to be normally distributed. Philosophically, a semiparametric viewpoint has been taken in that it is assumed that each class can be modelled by a (potentially complex) mixture model and that no significance is to be ascribed to an individual term in the mixture. As an example, contrast a mixture model approximation to a lognormal density with a mixture of two normals. In the latter case, it may well make sense to care about which of the two terms gave rise to a particular observation. However, in the lognormal case, where by assumption the density is nonparametric with respect to representation by normal mixtures, this sort of distinction has little or no meaning.

This approach is thus appropriate for combined supervised/unsupervised (various levels of class categorization) learning when the individual class densities may be more complex than simple normals. It provides a unified framework for handling this problem. Once the joint representation density has been estimated, densities corresponding to the individual classes can be easily recovered.

The recursive versions of these equations allow this approach to be used for large data sets as well as in an adaptive mixture model framework.